Text File | 1990-06-19 | 28KB | 632 lines

Literate C++

Marco S. Hyman
uucp: ...!pacbell!dumbcat!marc

Next time you're feeling bored, ask a group of programmers to define good program documentation -- then duck before every volume of The Art of Computer Programming is thrown your way. The extremes are easy to identify. There is usually at least one in the group insisting that the only documentation ever needed is the source, the whole source, and nothing but the source. At the other end of the spectrum will be the programmer bent under weighty volumes of requirements analysis documents, system design documents, HIPO charts, data flow diagrams, data dictionaries, structure charts, and, of course, source listings.

One of the reasons for such diverse opinions is that each programmer is likely to have a different documentation goal. Some programmers want to explain algorithms; others want to show data flow and state transitions. There are also those who do the bare minimum required by the organization they work for. This last group can't see any purpose in writing documentation, so perhaps we should state one:

    "The purpose of program documentation is to provide enough information for another programmer to understand and maintain the program."

If this purpose seems a bit altruistic, substitute "you, after not looking at the program for a year" for "another programmer." The purpose is just as strong and certainly hits closer to home.

But what about C++ class documentation? One of the advantages of C++ and object-oriented programming is that it leads to code re-use. However, a programmer is not likely to re-use code when its function is a mystery. Forcing another programmer to look at your implementation to discover what your code does is not polite. It leads to your code being tossed in favor of code the other guy understands. Code is not re-usable until it's documented.

    "The purpose of class documentation is to provide enough information for another programmer to use the class and member functions of the class."

Two levels of documentation are needed: the first level for users of a class and the second level for maintainers of the class. Class users need something like the pages in your C library reference manual. Class maintainers need to know the algorithms used and WHY the code is the way it is. Both the code and the class documentation will convey WHAT the class does.

Literate Programming

Don Knuth conceived the idea of programs as works of literature and created "Literate Programming" (see sidebar) as a method of explaining to programmers what the computer was to do. In Knuth's implementation a program is written in WEB, a language consisting of both TeX text and Pascal text. The combined text is processed by two programs, TANGLE and WEAVE, to produce both Pascal source code and TeX formatted documentation.

There are many advantages to keeping the documentation and the code in the same file. A programmer is more likely to update both when both are on the screen at the same time, thus keeping the program and documentation in sync. It also becomes impossible to lose the documentation (without also losing the source). With the proper tools a pretty-printed version of the source can be included in the documentation. Most important, the programmer is encouraged to think about both documentation and code. This usually has the side effect of improving both.

To see if the literate programming paradigm can be stretched to include C++ I propose Literate C++ (lc++)*1. As shown in figure 1, an lc++ input file (file.lc) will be used to create both a C++ header file (file.h) and a C++ source file (file.cc). The lc++ input file (file.lc) can also be processed to create a library manual page (file.3) and a class documentation file (file.doc).

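The file naming scheme can be sketched in a few lines of C++. This is only an illustration modeled on the BEGIN block of the lcpp prototype in Listing 1; the struct OutputNames and the function outputNames are hypothetical names that are not part of lc++, and only the two files the prototype actually generates are covered.

```cpp
#include <string>

// Hypothetical helper, not part of lc++: mirrors how the lcpp prototype
// derives the header and source file names from the input file name.
struct OutputNames {
    std::string header;  // for example file.h
    std::string source;  // for example file.cc
};

OutputNames outputNames(const std::string& input) {
    // Like lcpp, cut the name at the FIRST '.' and keep the dot.
    // A name without any dot simply gets ".h" and ".cc" appended.
    std::string::size_type dot = input.find('.');
    std::string stem = (dot == std::string::npos)
                           ? input + "."
                           : input.substr(0, dot + 1);
    return OutputNames{stem + "h", stem + "cc"};
}
```

Called with "test.lc", the function yields test.h and test.cc, the names used for Listings 3 and 4. Because the cut is made at the first dot, an input name with several dots loses everything after the first one.
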
An lc++ language has been designed and a header and source file extraction program, named lcpp, has been prototyped in awk.*2 They are both described below. My goal is to use the prototype program for a while, refining the lc++ language as it is used to write the second generation extraction tool in lc++ itself. The next step will be to determine a good format for both types of documentation and to create the documentation extraction tools.

The lc++ Language

The lc++ commands are listed in figure 2. Each command starts with an at sign (@) and currently must be the first token on a line, although this is likely to change.

Each lc++ input (.lc) file creates one header file and one source file. If multiple headers are required, for example, then each must have its own lc++ file. One of the purposes of the awk prototype is to determine if this limitation is reasonable.

The .lc file contains three sections. The first section consists of all text and commands prior to the @specification command. Text in this section is ignored by both the code and documentation extraction tools. Commands usually found in this section are @title and @copyright.

The second section of the file starts with @specification. Code and text in the specification section are used to create the header (.h) file and the library man page (.3) file. The code in this section defines classes and declares member functions.

The final section starts with the @implementation command. This section is used to define the member functions declared in the specification section. The text in this section is free form and is used to explain what is being done and why. Code and text in this section create the source (.cc) file and the documentation (.doc) file. @inline commands will cause code to be added to the end of the header file.

A description of each command follows.

@title: The @title command causes a title, perhaps including version information, to be written to both the .h and the .cc output files.
Along with the title is a canned notice that explains that the output files should not be modified directly, but that changes should be applied to the input (.lc) file and lc++ run again to generate new output files.

@copyright: The text following the @copyright command is written to both output files. Comment delimiters must be supplied. This is not done automatically as different authors prefer different commenting styles. The copyright section is also a good place to include a change log, such as that built by RCS or any other version control system you may be using.

@code: The @code command enables output. The location of the output, .h file or .cc file, depends upon the current mode (specification or implementation). All lines following the @code line are written to the current file. Output continues until a command that disables output is encountered. An @code is not always required for output. The @copyright command above, for example, enables output by default.

@text: The @text command disables .h or .cc output and signifies the beginning of documentation. The text will be written to the library manual page (.3) file when @text is seen in the specification section. Text will be written to the documentation (.doc) file when seen in the implementation section. Alternating @text and @code commands are often seen as the author goes back and forth between coding and documenting.

@specification: The @specification command selects specification mode. In specification mode @code output is written to the .h file immediately. Classes are defined and class members are declared in this mode, but the resulting class definitions and member declarations are not written until the specification mode ends. Output is not enabled by the @specification command. The mode is ended by the end of the lc++ input (.lc) file or by an @implementation command.

@class: The @class command starts the definition of a new class.
Classes are always output in the order in which the @class commands are found in the lc++ input (.lc) file. No output is done until the specification section of the .lc file is finished. If circular class definitions are required, use a class x; declaration in an @code block before the @class definition.

@base: The @base command declares the base classes that make up a class. The syntax of the command is @base @<classname> <base class description>. The @<classname> is optional. By default, an @base command adds a base class to the last class defined with the @class command. Because it may be easier to document related classes by bouncing between them, it is possible to add a base class to any previously defined class by using the @<classname> syntax. Example:

    @class Class1                 // defines Class1
    @class Class2                 // defines Class2
    @base virtual public Base2    // adds Base2 as a base class of Class2
    @base @Class1 public Base1    // adds Base1 as a base class to Class1
    @base private Base2p          // adds Base2p as another base class
                                  // to Class2 (the last defined class).

(The // annotations are explanatory only. The prototype lcpp in Listing 1 requires exactly one name after @class and copies the rest of an @base line verbatim into the header.)

@public, @protected, and @private: These three commands add members to the current class. As with the @base command, members can be added to a previously named class by adding an @<classname> after the @public, @protected, or @private command. The text on the command line after the command will be copied into the class definition. Proper C++ syntax must be followed.

The text on the lines following the command should explain when to use the member and what the member does. Of course, this pertains to member functions much more so than data members. How member functions are implemented is *not* appropriate subject matter here. This is still part of the specification. The implementation could vary many ways and still meet the specification. This text is *not* added to the header file.

@requires: The @requires command introduces text that describes caller requirements.
That is, if the requirements are not followed then the called function is not required to work. An example of a requirement would be that only positive numbers are passed to a square root function. This command always pertains to the last @public, @protected, or @private command found in the lc++ input (.lc) file.

@effects: This command introduces a very brief description of what the member function does, i.e. what the effects of calling the member function are. The description is used to generate class documentation. This command always pertains to the last @public, @protected, or @private command found in the lc++ input (.lc) file.

See Abstraction and Specification in Program Development by Liskov and Guttag on specifying procedures by use of a requires and an effects clause. They also use a modifies clause, which could be added to lc++.

@implementation: The @implementation command forces the classes defined so far to be written to the header (.h) file and switches @code output to the source (.cc) file. All output, except for @inline (see below), will now be written to the source file. Text written after the @implementation command should discuss implementation details; more of the *how* than the *why*.

@member: The @member command starts the definition of a member function. All lines following the @member command will be copied to the source (.cc) file. The command will be used when the documentation extraction programs are written. In the source extraction program it acts as an @code.

@inline: The @inline command adds the lines following the command to the header (.h) file. The member function should have been declared as inline in the specification section of the file. This is not verified, however, and can lead to problems. For this reason future versions of the language will not use this keyword.

The AWK extraction program

Listing 1 is lcpp, the awk program used to process the lc++ input file.
It uses the features of new awk, as described in The AWK Programming Language by Aho, Kernighan, and Weinberger.

The program is fairly simple and should be easy to understand. Two arrays, class and className, are used to associate a class name with a class number. The array class returns a class number when indexed by a name. The array className returns a name when indexed by a number.

The only tricky bit of coding is in the use of awk's associative arrays to force classes and member functions to be output in the same sequence in which they were input. The member array illustrates the technique. Whenever a new class is defined, three entries are added to the member array for the class using the class number classNum: member[classNum, "public"], member[classNum, "protected"], and member[classNum, "private"]. The three entries are initialized to 0. Each entry holds the number of members stored so far and is used as the index for the next member definition. A public member definition would be added at member[classNum, "public", member[classNum, "public"]]. Note that this entry uses three indexes and the third is the current count. The count is incremented after the entry is added. The functions doClass and doMembers use these embedded counts to control printing.

A Short Example

Listing 2 contains a short example of literate C++. The code doesn't do anything except illustrate some of the features of the language. Note how descriptive text can be placed anywhere in the file.

When processed by awk and lcpp, two output files are created. With the input file named test.lc the output files are named test.h (listing 3) and test.cc (listing 4). The command line used to generate these files was

    awk -f lcpp test.lc

but this may vary between operating systems and versions of awk.

The definition of Literate C++ is not complete. Non-member functions are not handled and inline member functions must be declared as inline in too many places.
Also, little thought has been given to how documentation should be typeset. Documentation requirements are sure to force changes to the definition. With use, this prototype will help show what other changes need to be made to the language.

Will literate C++ work? Think of all the programs you've had to learn over the years. Now think of those that have been the easiest to understand. Weren't the ones easiest to understand accompanied by articles in Computer Language, or Dr. Dobb's, or Byte: code and text -- a literate programming style?

Marco S. Hyman is a principal engineer, designing and writing software for a company in San Francisco. C++ and object-oriented programming are hobbies he pursues at home. He can be reached via e-mail (UUCP) at ...!pacbell!dumbcat!marc.

Bibliography

Aho, A.V., B.W. Kernighan, and P.J. Weinberger, The AWK Programming Language, Addison-Wesley, Reading, Mass. (1988).

Liskov, B., and J. Guttag, Abstraction and Specification in Program Development, MIT Press, Cambridge, Mass. (1986).

*1 Note: By rights this should be called C++WEB or WEB++. I thought of lc++ first and like the name so haven't changed it.

*2 Note: Lcpp is written in awk and requires new awk (nawk for old UNIX hands). I believe the DOS ports of awk are new awk compatible.

Sidebar: Literate Programming

Literate Programming is the name given by Donald Knuth to a programming language and documentation system built around the idea that a program can be considered a work of literature. It is Knuth's belief that a "practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style." These main concerns emphasize the goal of a literate program: explaining to another programmer what the computer is to do.

Knuth's literate programming is implemented in WEB, a language that combines the features of two other languages, TeX and PASCAL. WEB programs are descriptions of software systems.
A WEB description is processed by two other programs, TANGLE and WEAVE, to produce a PASCAL source file and a TeX input file. When the TeX input file is processed by TeX the output is a "pretty printed" version of the program with supporting documentation.

WEB files are composed of modules, with each module consisting of three parts: TeX explanatory material, definitions (WEB adds simple macros to PASCAL), and PASCAL code. Each module is more or less self-contained and should not be so long that its structure is hidden in its length and complexity. Modules are often a few lines long; they are rarely longer than a page.

Other versions of WEB or WEB-like languages are also in use. CWEB is similar to WEB but the output is TeX and C. (This is not to be confused with the WEB2C tool that converts original WEB to C code.) loom is a preprocessor written by Janet Incerpi and Robert Sedgewick and used in preparation of Sedgewick's book Algorithms (Addison-Wesley, Reading, Mass., 1983).

The Communications of the ACM has an occasional column on literate programming moderated by Christopher J. Van Wyk of AT&T Bell Laboratories. See the July 1987, December 1987, December 1988, June 1989, and September 1989 issues. The latest column described the language SPIDER, which is used to generate WEBs for other languages.

For more information see also:

Bentley, J., D. Knuth, and D. McIlroy, "Programming Pearls: A Literate Program," Communications of the ACM, 29,6 (June 1986), 471-483.

Knuth, D., "Literate Programming," Computer Journal, 27,2 (1984), 97-111.

Knuth, D., The WEB System of Structured Documentation, Stanford Computer Science Report CS980 (September 1983).

Figure 1

                       ..............
                       . lc++ input .
                       . (file.lc)  .
                       ..............
                              |
            .....................................
            |                                   |
     lcpp awk script                   some future program
            |                                   |
      ..............                   ....................
      |            |                   |                  |
............ .............  ................... ..................
. C++      . . C++       .  . class           . . class          .
. Header   . . Source    .  . use (man page)  . . implementation .
. (file.h) . . (file.cc) .  . (file.3)        . . (file.doc)     .
............ .............  ................... ..................

Figure 2

@title            Assign a title to the output files.
@copyright        Put copyright info in output files.
@code             Flag the following lines as code to be written to an
                  output file.
@text             Flag the following lines as text that is not to be
                  written to an output file.
@specification    Start defining a specification.
@class            Define a new class.
@base             Specify a base class for a previous class definition.
@public           Specify a public interface to a class.
@protected        Specify a protected interface to a class.
@private          Specify a private interface to a class.
@requires         Specify member function requirements.
@effects          Specify member function effects.
@implementation   Start defining an implementation.
@inline           Define an inline member function.
@member           Define a member function.

Listing 1 (lcpp)

# @(#) lcpp 12feb90 (msh)

# function timestamp: outputs the file creation timestamp
# this function may not work on non-unix systems
function timestamp( file ) {
    "date" | getline d
    print "// @(#) " file " created " d > file
}

# function notice: outputs the title and do not revise
# notice for the passed file.
function notice( title, file ) {
    print title > file
    print "" > file
    print "// This file generated from the input file " ARGV[1] > file
    print "// DO NOT REVISE THIS FILE." > file
    print "// To make revisions modify the original input file." > file
    print "" > file
}

# function members: keep track of members by class and type
# Entries are kept in the order defined.
function members( type ) {
    $1 = ""
    # an optional @<classname> selects a previously defined class
    if ( $2 ~ /^@/ ) {
        classNum = class[ substr($2,2) ]; $2 = ""
    } else {
        classNum = classCount
    }
    # the stored count doubles as the index of the next free slot
    member[classNum,type,member[classNum,type]] = $0
    ++member[classNum,type]
}

# function error: print line number, error message,
# and increase error counter
function error( msg ) {
    print "Line " NR ": " msg
    errors++
}

# function doMembers: output members of a given type for a given class
function doMembers( num, type ) {
    if ( member[num,type] > 0 ) {
        print type ":" > hOut
        for (i = 0; i < member[num,type]; ++i) {
            print " " member[num,type,i] > hOut
        }
    }
}

# function doClass: outputs a class specification from
# the internal class tables
function doClass( num ) {
    # output the class header
    print "" > hOut
    printf "class %s", className[num] > hOut

    # Add any base classes.  Output the opening brace.
    for ( i = 0; i < base[num]; ++i ) {
        printf "%s", base[num,i] > hOut
    }
    print " {" > hOut

    # output the various members
    doMembers( num, "public" )
    doMembers( num, "protected" )
    doMembers( num, "private" )

    # terminate the class.
    print "};" > hOut
}

# verify the correct number of arguments and build
# the name of the output files
BEGIN {
    if (ARGC != 2) {
        print "usage: " ARGV[0] " -f lcpp file"
        exit 1
    }
    count = index(ARGV[1],".")
    if (count == 0) {
        hOut = ARGV[1] ".h"
        ccOut = ARGV[1] ".cc"
    } else {
        hOut = substr(ARGV[1],1,count) "h"
        ccOut = substr(ARGV[1],1,count) "cc"
    }
    timestamp(hOut); timestamp(ccOut)
}

# @<anything>: turn off output whenever an @command is found
$1 ~ /^@.*/ { outEnabled = 0 }

# @title: The title is written to both output files as a comment.
# output remains off.
$1 == "@title" { $1 = "// title: "; notice( $0, hOut ); notice( $0, ccOut ); next }

# @copyright: Output is turned on so the following copyright info
# is written to both output files.
$1 == "@copyright" { hOutEnabled = 1; ccOutEnabled = 1; outEnabled = 1; next }

# @specification: Marker for the start of a specification.
# direct output to the header file only, but keep output disabled
$1 == "@specification" { hOutEnabled = 1; ccOutEnabled = 0; next }

# @text: Disable output (actually done above, just eat the @text)
$1 == "@text" { next }

# @code: Enable output for the following lines.
$1 == "@code" { outEnabled = 1; next }

# @class: look for class definition.  Verify the class name.
# Start storing info in an array entry for the class.
$1 == "@class" {
    if ( NF != 2 ) {
        error( "invalid class definition" )
    } else {
        if ( $2 in class ) {
            error( "duplicate class name" )
        } else {
            ++classCount; classNum = classCount
            class[$2] = classNum; className[classNum] = $2
            base[classNum] = 0
            member[classNum,"public"] = 0
            member[classNum,"protected"] = 0
            member[classNum,"private"] = 0
        }
    }
    next
}

# @base: define a base for the named class.  If no class is
# named use the last class defined.  Add it to the base class
# array for the appropriate class.
$1 == "@base" {
    if ( $2 ~ /^@.*/ ) {
        classNum = class[ substr($2,2) ]; $2 = ""
    } else {
        classNum = classCount
    }
    if ( classNum ) {
        $1 = base[classNum] == 0 ? ":" : ","
        base[classNum,base[classNum]] = $0
        ++base[classNum]
    } else {
        error( "no class for base definition" )
    }
    next
}

# keep track of public entries by class.
$1 == "@public" { members( "public" ); next }

# keep track of protected entries by class.
$1 == "@protected" { members( "protected" ); next }

# keep track of private entries by class.
$1 == "@private" { members( "private" ); next }

# process @requires.  Ignore for now.
$1 == "@requires" { next; }

# process @effects.  Ignore for now.
$1 == "@effects" { next; }

# entering the implementation section of the input.  Set code output to go
# to the cc file after dumping the classes.  Output remains off.
$1 == "@implementation" {
    for ( classNum = 1; classNum <= classCount; classNum++ ) {
        doClass( classNum )
    }
    classCount = 0
    hOutEnabled = 0
    ccOutEnabled = 1
    print "#include \"" hOut "\"" > ccOut
    next
}

# member function definition.  Enable output to the cc file.
$1 == "@member" { hOutEnabled = 0; ccOutEnabled = 1; outEnabled = 1; next }

# inline member function.  Enable output to the h file.
$1 == "@inline" { hOutEnabled = 1; ccOutEnabled = 0; outEnabled = 1; next }

# check if an invalid @command was given and flag the line number
$1 ~ /^@/ { error( "unknown command" ); next }

# if output is enabled for the header file write this line out
outEnabled == 1 && hOutEnabled == 1 { print $0 > hOut }

# if output is enabled for the cc file write this line out
outEnabled == 1 && ccOutEnabled == 1 { print $0 > ccOut }

END {
    for ( classNum = 1; classNum <= classCount; classNum++ ) {
        doClass( classNum )
    }
    close( hOut ); close( ccOut );
    if ( errors ) {
        print errors " error(s) found"
        exit 1
    } else {
        print "generated " hOut " and " ccOut
    }
}

Listing 2 (test.lc)

@title Example Program
This text does not go in either file.
@copyright
/*
 * This class doesn't do anything.
 */
@text
Note: Copyright output is to both files until the next @command
@specification
Code output is not enabled.  If you wish something to be written to the
header file you must turn on code generation by using an @code
@code
#include <stdio.h>
@text
stdio.h was included above as it is used by one of the inline functions.
@class testClass
This is where testClass is described.
@private int dataMember;
This is where dataMember is described.
@public inline testClass();
@requires
The requirements, if any, of the testClass constructor.
@effects
The effects of calling the testClass constructor
@text
General text about the constructor.
@public virtual ~testClass();
@implementation
Text describing implementation issues.
@code
// this will be part of the .cc file
@text
The next function is inline, so it will be added to the header file.
This assumes that the function has been declared inline above.
@inline
testClass::testClass()
{
@text
Text can be added even in the middle of a function.  Just use @code to
start outputting code again.
@code
    printf( "testClass constructor\n" );
}
@text
The next function is a member function.
@member
testClass::~testClass()
{
    // do something here
}

Listing 3 (test.h)

// @(#) test.h created Thu Mar 29 17:56:47 PST 1990
// title: Example Program

// This file generated from the input file test.lc
// DO NOT REVISE THIS FILE.
// To make revisions modify the original input file.

/*
 * This class doesn't do anything.
 */
#include <stdio.h>

class testClass {
public:
  inline testClass();
  virtual ~testClass();
private:
  int dataMember;
};
testClass::testClass()
{
    printf( "testClass constructor\n" );
}

Listing 4 (test.cc)

// @(#) test.cc created Thu Mar 29 17:56:47 PST 1990
// title: Example Program

// This file generated from the input file test.lc
// DO NOT REVISE THIS FILE.
// To make revisions modify the original input file.

/*
 * This class doesn't do anything.
 */
#include "test.h"
// this will be part of the .cc file
testClass::~testClass()
{
    // do something here
}
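
The ordering technique described under "The AWK extraction program" (a per-class, per-access-level count that doubles as the index of the next free slot) can be restated in C++ for readers who do not read awk. This is a minimal sketch under stated assumptions, not a translation of lcpp: MemberTable, add, and inOrder are hypothetical names that appear nowhere in the prototype, and std::map merely stands in for awk's associative arrays.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical C++ analogue of lcpp's "member" array (illustration only).
class MemberTable {
public:
    // Store a declaration in the slot named by the current count, then
    // bump the count, as lcpp's members() function does.
    void add(int classNum, const std::string& access, const std::string& decl) {
        int& count = counts_[std::make_pair(classNum, access)];  // 0 on first use
        entries_[std::make_tuple(classNum, access, count)] = decl;
        ++count;
    }

    // Walk the slots 0..count-1 so declarations come back in input order,
    // as lcpp's doMembers() function does when printing.
    std::vector<std::string> inOrder(int classNum, const std::string& access) const {
        std::vector<std::string> out;
        auto it = counts_.find(std::make_pair(classNum, access));
        int n = (it == counts_.end()) ? 0 : it->second;
        for (int i = 0; i < n; ++i) {
            out.push_back(entries_.at(std::make_tuple(classNum, access, i)));
        }
        return out;
    }

private:
    std::map<std::pair<int, std::string>, int> counts_;
    std::map<std::tuple<int, std::string, int>, std::string> entries_;
};
```

Adding the three members of testClass from Listing 2 in file order and then asking for the "public" entries returns the constructor followed by the destructor, which is the order shown in Listing 3. In C++ a plain std::vector per class and access level would do the same job more directly; the two maps are kept here only to mirror the indexing scheme of the awk program.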